Split the sample successively into smaller subsamples by answering “yes-no” questions.
Each question is based on a single variable: is it above a threshold?
Split a specified number of times (depth). Final subsamples are called leaves.
The prediction for each observation is the mean of the training observations that end up in the same leaf.
The variable to split on and the threshold are chosen each time to minimize the sum of squared errors (SSE) after the split.
Illustration
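The split-selection rule can be sketched in plain Python. This is a hypothetical minimal illustration for a single predictor at depth 1 (a “stump”); a real tree also chooses among several variables and splits recursively down to the specified depth. All names here (`sse`, `fit_stump`) are made up for the sketch.

```python
# Minimal sketch (assumed, not from the text): a depth-1 regression tree
# on a single predictor, choosing the threshold that minimizes SSE.

def sse(ys):
    """Sum of squared errors around the mean."""
    m = sum(ys) / len(ys)
    return sum((yi - m) ** 2 for yi in ys)

def fit_stump(x, y):
    best = None  # (score, threshold, left mean, right mean)
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not right:          # no split at the largest value
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    _, t, m_left, m_right = best
    # prediction = mean of the training observations in the same leaf
    return lambda xi: m_left if xi <= t else m_right

predict = fit_stump([1, 2, 3, 4], [0, 0, 10, 10])
# splits at x <= 2: predict(1) -> 0.0, predict(4) -> 10.0
```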
Random forest
Multiple trees fit to randomly resampled data
Data for each tree is a bootstrapped sample:
random selection of rows (with replacement)
same size as original sample
Prediction is average of predictions of the trees
Hyperparameters = number of trees and depth of trees
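The bootstrap-and-average idea can be sketched with a depth-1 tree (“stump”) as the base learner. The names (`fit_stump`, `fit_forest`) and the fallback-to-mean rule are assumptions for the sketch; real random forests also sample a random subset of variables at each split, which this omits.

```python
import random

def sse(ys):
    m = sum(ys) / len(ys)
    return sum((yi - m) ** 2 for yi in ys)

def fit_stump(x, y):
    """Depth-1 regression tree; falls back to the overall mean if no split exists."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:                       # all x equal in this bootstrap sample
        m = sum(y) / len(y)
        return lambda xi: m
    _, t, m_left, m_right = best
    return lambda xi: m_left if xi <= t else m_right

def fit_forest(x, y, n_trees=100, seed=0):
    rng = random.Random(seed)
    n, trees = len(x), []
    for _ in range(n_trees):
        # bootstrap: rows drawn with replacement, same size as the original sample
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    # prediction = average of the trees' predictions
    return lambda xi: sum(t(xi) for t in trees) / len(trees)

predict = fit_forest([1, 2, 3, 4], [0, 0, 10, 10])
```

Averaging over bootstrapped trees smooths out the hard jumps of a single tree, which is the point of the ensemble.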
Gradient boosting
Multiple trees
First tree fit to data
Second tree fit to errors from first tree
Third tree fit to errors from second tree, …
Prediction is sum of predictions
Hyperparameters = number of trees and depth of trees
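The fit-to-errors loop above can be sketched the same way, again with a hypothetical depth-1 base tree. Note that practical implementations usually also shrink each tree by a learning rate, which this sketch omits.

```python
# Minimal boosting sketch (assumed, not from the text).

def sse(ys):
    m = sum(ys) / len(ys)
    return sum((yi - m) ** 2 for yi in ys)

def fit_stump(x, y):
    """Depth-1 regression tree minimizing SSE."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    _, t, m_left, m_right = best
    return lambda xi: m_left if xi <= t else m_right

def fit_boost(x, y, n_trees=3):
    trees, resid = [], list(y)
    for _ in range(n_trees):
        tree = fit_stump(x, resid)       # each tree fits the current errors
        trees.append(tree)
        resid = [r - tree(xi) for r, xi in zip(resid, x)]
    # prediction = sum of the trees' predictions
    return lambda xi: sum(t(xi) for t in trees)

predict = fit_boost([1, 2, 3, 4], [0, 0, 10, 10])
# first tree fits this toy data exactly, later trees fit zero residuals:
# predict(1) -> 0.0, predict(4) -> 10.0
```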
Neural networks
Multi-layer perceptrons
A multi-layer perceptron (MLP) consists of “neurons” arranged in layers.
A neuron is a mathematical function: it takes inputs \(x_1, \ldots, x_n\), calculates \(y=f(x_1, \ldots, x_n)\), and passes \(y\) to the neurons in the next layer.
The inputs in the first layer are the predictors.
The inputs in successive layers are the calculations from the prior layer.
The last layer is a single neuron that produces the output.
Illustration
input is \(x \in \mathbb{R}^4\)
functions \(f_1, \ldots, f_5\) of \(x\) are calculated (called “hidden layer”)
output is \(g(f_1(x), \ldots, f_5(x))\)
Rectified linear units
The usual function for the neurons (except in the last layer) is \[ y = \max(0,b+w_1x_1 + \cdots + w_nx_n)\] Parameters \(b\) (called bias) and \(w_1, \ldots, w_n\) (called weights) are different for different neurons.
This function is called a rectified linear unit (ReLU).
Last layer uses a linear function \[ y = b+w_1x_1 + \cdots + w_nx_n\]
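The earlier illustration (\(x \in \mathbb{R}^4\), five hidden ReLU neurons, one linear output) can be written out directly. The weight and bias values below are arbitrary random numbers for illustration, not anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
# hidden layer: 5 neurons, each with 4 weights and a bias (values are illustrative)
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
# last layer: a single linear neuron over the 5 hidden outputs
w2, b2 = rng.normal(size=5), rng.normal()

def mlp(x):
    h = np.maximum(0.0, W1 @ x + b1)  # ReLU: y = max(0, b + w.x) for f_1..f_5
    return w2 @ h + b2                # linear last layer g(f_1(x), ..., f_5(x))

y = mlp(np.array([1.0, -2.0, 0.5, 3.0]))  # a single number
```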
Analogy to neurons firing
If \(w_i>0\) and \(b<0\), then \(y>0\) only when the \(x_i\) are large enough.
A neuron fires when it is sufficiently stimulated by signals from other neurons (in prior layer).
Deep learning
Deep learning means a neural network with many layers.
Deep learning is behind facial recognition, self-driving cars, …
Need a specialized library, probably TensorFlow (from Google) or PyTorch (from Facebook)
And probably need a graphics processing unit (GPU) – i.e., run on a video card
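As a taste of what the library code looks like, here is a hypothetical minimal PyTorch model with several layers; the layer sizes are arbitrary, and real deep networks are far larger.

```python
import torch
from torch import nn

# Hypothetical sketch: "deep" just means many stacked layers.
model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),            # linear last layer, single output
)
y = model(torch.randn(8, 4))     # batch of 8 observations, 4 predictors each
```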